autovi: Automated Assessment of Residual Plots Using Computer Vision

ECSSC2024

Weihao (Patrick) Li

Monash University

🤔Challenges

  • A vertical spread of points that varies with the fitted values is commonly read as a sign of heteroskedasticity.

  • However, this is an over-interpretation.

  • The visual pattern is caused by a skewed distribution of the predictor.
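The effect can be reproduced with a short simulation. The sketch below is not from the talk: it assumes a log-normal predictor and a correctly specified model with constant error variance, then compares the two halves of the fitted-value range. All variable names are illustrative.

```r
set.seed(2024)

n <- 300
x <- rlnorm(n)               # right-skewed predictor (assumed log-normal)
y <- 1 + 2 * x + rnorm(n)    # correctly specified model, constant error variance
fit <- lm(y ~ x)

# Split the residual plot at the midpoint of the fitted-value range
midpoint <- mean(range(fitted(fit)))
left <- fitted(fit) <= midpoint

n_left <- sum(left)
n_right <- sum(!left)

# Vertical extent of the residuals on each side of the plot
spread_left <- diff(range(resid(fit)[left]))
spread_right <- diff(range(resid(fit)[!left]))

print(c(n_left = n_left, n_right = n_right))
print(c(spread_left = spread_left, spread_right = spread_right))
```

Most points fall on the left side of the plot, so the left side typically shows a wider vertical range even though the error variance is the same everywhere. That is the triangle shape an observer may mistake for heteroskedasticity.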

🔬Null residual plots

  • Assume the fitted model is correctly specified and simulate residuals from it.
  • A left-triangle shape is relatively common among null residual plots.

autovi Package

The autovi package provides automated assessment of residual plots with computer vision models.

It estimates visual signal strength (VSS), which quantifies the disparity between the actual residual distribution and the reference distribution, based on KL divergence.
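To make the KL divergence concrete, here is a minimal sketch that computes D_KL(P || Q) for two zero-mean normal distributions, once by numerical integration and once from the closed form. The function names are hypothetical and are not part of autovi. The slides do not say how the package maps the divergence to VSS, so this only illustrates the underlying quantity.

```r
# KL divergence D_KL(P || Q) for P = N(0, sd_p^2) and Q = N(0, sd_q^2),
# computed by numerical integration of p(e) * log(p(e) / q(e))
kl_normal_numeric <- function(sd_p, sd_q) {
  integrand <- function(e) {
    p <- dnorm(e, sd = sd_p)
    log_ratio <- dnorm(e, sd = sd_p, log = TRUE) - dnorm(e, sd = sd_q, log = TRUE)
    out <- p * log_ratio
    out[p == 0] <- 0  # by convention, 0 * log(0 / q) contributes nothing
    out
  }
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

# Closed form for the same two zero-mean normal distributions
kl_normal_closed <- function(sd_p, sd_q) {
  log(sd_q / sd_p) + sd_p^2 / (2 * sd_q^2) - 0.5
}

kl_numeric <- kl_normal_numeric(1, 2)
kl_closed <- kl_normal_closed(1, 2)
kl_reverse <- kl_normal_closed(2, 1)

print(c(numeric = kl_numeric, closed = kl_closed, reverse = kl_reverse))
```

The divergence is zero only when the two distributions coincide, and it is not symmetric, so the direction (actual versus reference) matters.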

Core Methods

  • Null residuals simulation: rotate_resid()
  • Visual signal strength: vss()
  • Comprehensive checks: check() and summary_plot()

💡Example: Boston Housing

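library(ggplot2)

# Fit a linear model to the Boston housing data
fitted_model <- lm(MEDV ~ RM + LSTAT + PTRATIO, data = housing)

# Residuals against fitted values, drawn without axes
ggplot() +
  geom_point(aes(fitted(fitted_model), 
                 resid(fitted_model))) +
  theme_void()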

rotate_resid()

Null residuals are simulated from the fitted model assuming it is correctly specified.

library(autovi)
checker <- residual_checker(fitted_model)
checker$rotate_resid()
# A tibble: 489 × 2
   .fitted   .resid
     <dbl>    <dbl>
 1 632372.   -3870.
 2 525177. -145487.
 3 646753.    5602.
 4 624848.  122366.
 5 611817.  -12470.
 6 551051.  -45186.
 7 504757. -144455.
 8 445700.   70620.
 9 281912.   26909.
10 453398.  -86980.
# ℹ 479 more rows
checker$rotate_resid() |>
  checker$plot_resid()
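One common way to simulate null residuals is residual rotation: regress pure noise on the same predictors and rescale the result to match the original residual sum of squares. The base R sketch below shows that idea. The helper `rotate_residuals()` is hypothetical, and whether `rotate_resid()` works exactly this way internally is an assumption the slides do not confirm.

```r
# Hypothetical helper (not part of autovi): residual rotation for an lm fit
rotate_residuals <- function(model) {
  X <- model.matrix(model)
  noise <- rnorm(nrow(X))
  # Residuals of pure noise regressed on the same design matrix
  null_resid <- lm.fit(X, noise)$residuals
  # Rescale so the residual sum of squares matches the original fit
  null_resid * sqrt(sum(resid(model)^2) / sum(null_resid^2))
}

set.seed(2024)
dat <- data.frame(x = runif(100))
dat$y <- 1 + 2 * dat$x + rnorm(100)
fit <- lm(y ~ x, data = dat)

null_resid <- rotate_residuals(fit)
null_plot_data <- data.frame(.fitted = fitted(fit), .resid = null_resid)
head(null_plot_data)
```

The simulated residuals are orthogonal to the predictors and have the same sum of squares as the observed ones, so a plot built from them shows what the residual plot could look like if the model were correct.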

vss()

Visual signal strength of the actual residual plot

checker$vss()
✔ Predict visual signal strength for 1 image.
# A tibble: 1 × 1
    vss
  <dbl>
1  6.48

Visual signal strength of a null plot

checker$rotate_resid() |>
  checker$vss()
✔ Predict visual signal strength for 1 image.
# A tibble: 1 × 1
    vss
  <dbl>
1  1.24

check()

checker$check()
── <AUTO_VI object>
Status:
 - Fitted model: lm
 - Keras model: (None, 32, 32, 3) + (None, 5) -> (None, 1)
    - Output node index: 1
 - Result:
    - Observed visual signal strength: 6.484 (p-value = 0)
    - Null visual signal strength: [100 draws]
       - Mean: 1.169
       - Quantiles: 
          ╔══════════════════════════════════════════╗
          ║  25%   50%   75%   80%   90%   95%   99% ║
          ║1.037 1.120 1.231 1.247 1.421 1.528 1.993 ║
          ╚══════════════════════════════════════════╝
    - Bootstrapped visual signal strength: [100 draws]
       - Mean: 6.28 (p-value = 0)
       - Quantiles: 
          ╔══════════════════════════════════════════╗
          ║  25%   50%   75%   80%   90%   95%   99% ║
          ║5.960 6.267 6.614 6.693 6.891 7.112 7.217 ║
          ╚══════════════════════════════════════════╝
    - Likelihood ratio: 0.7064 (boot) / 0 (null) = Extremely large 
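The p-value in this output can be read as an empirical one: the share of null draws whose VSS is at least as large as the observed VSS. The sketch below uses made-up numbers in place of model predictions. The formula is the usual empirical p-value and is assumed here, not taken from the package source.

```r
set.seed(1)

# Hypothetical values: in practice these come from the computer vision model
observed_vss <- 6.5
null_vss <- rlnorm(100, meanlog = 0.1, sdlog = 0.2)  # 100 null draws

# Share of null draws at least as extreme as the observed value
p_value <- mean(null_vss >= observed_vss)

# Equivalent decision rule: compare with the 95% quantile of the null draws
critical_value <- quantile(null_vss, 0.95)
reject <- unname(observed_vss > critical_value)

print(p_value)
print(reject)
```

With 100 draws the smallest non-zero p-value is 0.01, so a reported value of 0 means that none of the null plots reached the observed signal strength.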

summary_plot()

checker$summary_plot()

💡Example: Left-triangle

Breusch–Pagan test \(p\)-value = 0.0457

💡Example: Dinosaur

Ramsey Regression Equation Specification Error test \(p\)-value = 0.742

Breusch–Pagan test \(p\)-value = 0.36

Shapiro–Wilk test \(p\)-value = 9.21e-05
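For reference, the three conventional tests quoted in these examples can be sketched in base R on simulated data. The data here are assumptions and do not reproduce the left-triangle or dinosaur examples. The Breusch–Pagan statistic uses the studentized (Koenker) form, and the RESET test adds squared and cubed fitted values. Packages such as lmtest offer `bptest()` and `resettest()` for the same purpose.

```r
set.seed(42)

n <- 200
x <- rlnorm(n)            # skewed predictor, as in the left-triangle setting
y <- 1 + x + rnorm(n)     # correctly specified, homoskedastic model
fit <- lm(y ~ x)
yhat <- fitted(fit)

# Breusch-Pagan (studentized form): regress squared residuals on the predictor;
# the statistic n * R^2 is compared with a chi-squared distribution
aux <- lm(resid(fit)^2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

# Ramsey RESET: add powers of the fitted values and F-test the extra terms
fit_reset <- lm(y ~ x + I(yhat^2) + I(yhat^3))
reset_p <- anova(fit, fit_reset)$`Pr(>F)`[2]

# Shapiro-Wilk test of normality on the residuals
sw_p <- shapiro.test(resid(fit))$p.value

print(c(breusch_pagan = bp_p, reset = reset_p, shapiro_wilk = sw_p))
```

Each test targets a single departure (non-constant variance, non-linearity, or non-normality), which is why the results can disagree with what the residual plot shows as a whole.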

🌐Shiny Application

Don’t want to install TensorFlow?

Try our Shiny web application: https://autoviweb.patrickli.org

Thanks! Any questions?


tengmcing

patrick.li@monash.edu
